Importing Libraries

Exploratory Data Analysis

Data Visualisation

Data Preprocessing

Handling Null Values

Encoding

Feature Selection

Prediction Using Decision Tree Regression

We got a 100% score on the training data.

On the test data we got only a 76.5% score. Because we did not provide any tuning parameters while initializing the tree, the algorithm split the training data all the way down to the leaf nodes. The depth of the tree grew unchecked, and our model overfit.
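The overfitting effect described above is easy to reproduce. This is a minimal sketch using scikit-learn's `make_regression` as a stand-in dataset (the notebook's actual data is not shown here): an unconstrained `DecisionTreeRegressor` memorises the training set while generalising worse to held-out data.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the notebook's dataset
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# No depth/leaf constraints: the tree keeps splitting until every
# training sample sits in its own (pure) leaf.
tree = DecisionTreeRegressor(random_state=42)
tree.fit(X_train, y_train)

print(f"Train R^2: {tree.score(X_train, y_train):.3f}")  # perfect fit
print(f"Test  R^2: {tree.score(X_test, y_test):.3f}")    # noticeably lower
```

The train score is (near) perfect while the test score lags behind, which is exactly the gap the text describes.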

That is why we get a high score on the training data and a lower score on the test data.

To solve this problem we will use hyperparameter tuning.

We can use GridSearch or RandomizedSearch for hyperparameter tuning.
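As a sketch of that tuning step, here is `GridSearchCV` wrapped around a `DecisionTreeRegressor`. The parameter grid below is illustrative, not the notebook's exact grid, and the dataset is a synthetic stand-in.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the notebook's dataset
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Illustrative grid: parameters that limit how deep the tree can grow
param_grid = {
    "max_depth": [3, 5, 8, 12],
    "min_samples_split": [2, 10, 20],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
```

`RandomizedSearchCV` has the same interface but samples a fixed number of candidates (`n_iter`) from the grid, which is why it is usually much faster on large search spaces.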

We are getting a nearly bell-shaped curve; does that mean our model is working well? No, we can't draw that conclusion. A good bell curve only tells us that the predicted values fall within the same range as the original data values.
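That range property is actually guaranteed for a decision tree regressor: each leaf predicts the mean of the training targets that fell into it, so predictions can never leave the range of the training targets. A quick numeric check of this (again on stand-in data):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the notebook's dataset
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# Each leaf outputs a mean of training targets, so predictions are
# bounded by the training-target range by construction.
print(pred.min() >= y_train.min() and pred.max() <= y_train.max())  # True
```

So a well-shaped prediction distribution is a sanity check, not evidence of accuracy.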

Hyper Parameter tuning

Above, we initialized a range of candidate hyperparameter values and used GridSearch to find the best parameters for our decision tree model.

Hyperparameter tuning took around 13 minutes; this might vary depending on your machine.

Training Decision Tree With Best Hyperparameter
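Refitting with the best parameters is a one-liner once the search has run. The `best_params` dictionary below is a hypothetical example result, not the notebook's actual output, and the dataset is a synthetic stand-in.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the notebook's dataset
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Hypothetical search result (in practice: search.best_params_)
best_params = {"max_depth": 8, "min_samples_split": 10, "min_samples_leaf": 5}

tuned = DecisionTreeRegressor(random_state=42, **best_params)
tuned.fit(X_train, y_train)
print(f"Tuned test R^2: {tuned.score(X_test, y_test):.3f}")
```

Note that `GridSearchCV` with `refit=True` (the default) already retrains a model with the best parameters on the full training set, available as `search.best_estimator_`.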

OK, the scatter plot above looks a lot better.

Let us now compare the error rate of our hyperparameter-tuned model against the original model trained without any tuning.
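The comparison itself can be sketched as computing the test-set MSE for both trees side by side (stand-in data and illustrative parameters, as before):

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the notebook's dataset
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Unconstrained tree vs. a depth-limited tree (illustrative parameters)
untuned = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
tuned = DecisionTreeRegressor(
    max_depth=8, min_samples_leaf=5, random_state=42
).fit(X_train, y_train)

mse_untuned = mean_squared_error(y_test, untuned.predict(X_test))
mse_tuned = mean_squared_error(y_test, tuned.predict(X_test))
print(f"MSE untuned: {mse_untuned:.1f}")
print(f"MSE tuned:   {mse_tuned:.1f}")
```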

Conclusion

If you observe the metrics above for both models, the model without hyperparameter tuning gave an MSE of 80750526.32 and a 76.5% score, compared against the model with hyperparameter tuning.